Distributed Bayesian Probabilistic Matrix Factorization
Matrix factorization is a common machine learning technique for recommender
systems. Despite its high prediction accuracy, the Bayesian Probabilistic
Matrix Factorization (BPMF) algorithm has not been widely used on large-scale
data because of its high computational cost. In this paper we propose a
distributed, high-performance parallel implementation of BPMF for shared-memory
and distributed architectures. We show that by using efficient load balancing
based on work stealing on a single node, and asynchronous communication in the
distributed version, we outperform state-of-the-art implementations.
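The load-balancing idea can be illustrated with a minimal work-stealing sketch: each worker owns a deque, pops tasks from its own tail, and steals from the head of another worker's deque when it runs dry. This is an illustrative single-threaded simulation under assumed names (`work_steal_run`, `work_fn`), not the paper's implementation.

```python
import random
from collections import deque

def work_steal_run(task_lists, work_fn):
    """Simulate work stealing: per-worker deques, local LIFO pops,
    FIFO steals from a random non-empty victim when a worker is idle."""
    deques = [deque(tasks) for tasks in task_lists]
    results = []
    # Round-robin simulation of the workers on a single thread.
    while any(deques):
        for dq in deques:
            if dq:
                results.append(work_fn(dq.pop()))  # local pop (tail, LIFO)
            else:
                victims = [d for d in deques if d]
                if victims:
                    # Steal from the head (FIFO) of another worker's deque.
                    results.append(work_fn(random.choice(victims).popleft()))
    return results
```

Popping locally from the tail and stealing from the head is the classic design choice: it keeps local work cache-warm while steals grab the oldest (often largest) tasks.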
A GPU-accelerated Branch-and-Bound Algorithm for the Flow-Shop Scheduling Problem
Branch-and-Bound (B&B) algorithms are time-intensive tree-based exploration
methods for solving combinatorial optimization problems to optimality. In this
paper, we investigate the use of GPU computing as a major complementary way to
speed up these methods. The focus is put on the bounding mechanism of B&B
algorithms, which is the most time-consuming part of their exploration process.
We propose a parallel B&B algorithm based on a GPU-accelerated bounding model.
The proposed approach concentrates on optimizing data access management to
further improve the performance of the bounding mechanism, which uses large
intermediate data sets that do not completely fit in GPU memory. Extensive
experiments have been carried out on well-known FSP benchmarks using an Nvidia
Tesla C2050 GPU card. We compared the obtained performance to single-threaded
and multithreaded CPU-based executions. Accelerations of up to ×100 are
achieved for large problem instances.
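The pool-based bounding model described above can be sketched as a toy B&B in which the lower bounds of a whole pool of sub-problems are evaluated in one batched call (standing in for the GPU kernel). The toy problem, bound, and function names here are illustrative assumptions, not the paper's flow-shop lower bound.

```python
def batched_bounds(pool, bound_fn):
    # In the paper this batch is offloaded to the GPU; a plain loop
    # stands in here for the parallel bounding kernel.
    return [bound_fn(node) for node in pool]

def bb_min_permutation(costs):
    """Tiny B&B over permutations of range(n). A node is a partial
    permutation; its cost is sum(costs[pos][job]); the bound is the
    cost of the fixed prefix, admissible since costs are non-negative."""
    n = len(costs)
    best_val, best_perm = float("inf"), None
    pool = [()]
    while pool:
        bounds = batched_bounds(
            pool, lambda p: sum(costs[i][j] for i, j in enumerate(p)))
        next_pool = []
        for node, lb in zip(pool, bounds):
            if lb >= best_val:
                continue  # prune: bound cannot improve on the incumbent
            if len(node) == n:
                best_val, best_perm = lb, node  # complete: new incumbent
            else:
                next_pool += [node + (j,) for j in range(n) if j not in node]
        pool = next_pool
    return best_val, best_perm
```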
An Adaptative Multi-GPU based Branch-and-Bound. A Case Study: the Flow-Shop Scheduling Problem
Solving Combinatorial Optimization Problems (COPs) exactly using a
Branch-and-Bound (B&B) algorithm requires a huge amount of computational
resources. Therefore, we recently investigated designing B&B algorithms on top
of graphics processing units (GPUs) using a parallel bounding model. The
proposed model parallelizes the evaluation of the lower bounds over pools of
sub-problems. The results demonstrated that the size of the evaluated pool has
a significant impact on the performance of B&B and that it depends strongly on
the problem instance being solved. In this paper, we design an adaptive
parallel B&B algorithm for solving permutation-based combinatorial
optimization problems such as the Flow-shop Scheduling Problem (FSP) on GPU
accelerators. To do so, we propose a dynamic heuristic for parameter
auto-tuning at runtime. Another challenge of this work is to exploit larger
degrees of parallelism by using the combined computational power of multiple
GPU devices. The approach has been applied to the permutation flow-shop
problem. Extensive experiments have been carried out on well-known FSP
benchmarks using an Nvidia Tesla S1070 Computing System equipped with two Tesla
T10 GPUs. Compared to a CPU-based execution, accelerations of up to ×105 are
achieved for large problem instances.
Comment: 14th IEEE International Conference on High Performance Computing and
Communications, HPCC 2012 (2012)
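One plausible shape for such a runtime auto-tuning heuristic is hill climbing on the pool size: grow the pool while the measured bounding throughput keeps improving, then stop. This is a hypothetical sketch; `measure_throughput` is an assumed callback wrapping a timed bounding run, not an interface from the paper.

```python
def tune_pool_size(measure_throughput, start=128, factor=2, max_size=1 << 20):
    """Hill-climbing auto-tuner (illustrative): double the pool size
    while throughput (e.g. nodes bounded per second) improves, then
    keep the last size that improved it."""
    size, best = start, measure_throughput(start)
    while size * factor <= max_size:
        cand = measure_throughput(size * factor)
        if cand <= best:
            break  # throughput stopped improving; keep current size
        size, best = size * factor, cand
    return size
```

Because the best pool size depends on the instance being solved, the tuner would be re-run (or kept running) per instance rather than fixed offline.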
Reducing Thread Divergence in GPU-based B&B Applied to the Flow-shop problem
In this paper, we propose a pioneering work on designing and programming B&B algorithms on GPU. To the best of our knowledge, no contribution has been proposed to address such a challenge. We focus on the parallel evaluation of the bounds for the Flow-shop scheduling problem. To deal with thread divergence caused by the bounding operation, we investigate two software-based approaches called thread data reordering and branch refactoring. Experiments showed that parallel evaluation of the bounds speeds up execution by up to 54.5× compared to a CPU version.
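The branch-refactoring idea can be shown on a minimal example: replace a data-dependent if/else (which makes SIMD threads in a warp diverge) with branch-free arithmetic so every thread executes the same instruction sequence. A sketch of the general technique, not the paper's bounding code.

```python
def clamp_divergent(x, lo, hi):
    # Divergent form: threads taking different branches serialize
    # within a warp under the SIMD execution model.
    if x < lo:
        return lo
    elif x > hi:
        return hi
    return x

def clamp_refactored(x, lo, hi):
    # Branch-refactored form: the same result via min/max, so all
    # threads follow one identical instruction path.
    return max(lo, min(x, hi))
```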
Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm
In this paper, we address the design and implementation of GPU-accelerated Branch-and-Bound (B&B) algorithms for solving Flow-shop scheduling optimization problems (FSP). Such applications are CPU-time consuming and highly irregular. On the other hand, GPUs are massively multi-threaded accelerators using the SIMD execution model. A major issue which arises when executing a B&B applied to FSP on a GPU is thread (branch) divergence. Such divergence is caused by the lower-bound function of FSP, which contains many irregular loops and conditional instructions. Our challenge is therefore to revisit the design and implementation of B&B applied to FSP to deal with thread divergence. Extensive experiments of the proposed approach have been carried out on well-known FSP benchmarks using an Nvidia Tesla C2050 GPU card. Compared to a CPU-based execution, accelerations of up to ×77.46 are achieved for large problem instances.
Parallel optimization using / for multi- and multi-core high-performance computing
No abstract available.
Large-scale wearable data reveal digital phenotypes for daily-life stress detection
Physiological signals have been shown to be reliable indicators of stress in laboratory studies, yet large-scale ambulatory validation is lacking. We present a large-scale cross-sectional study of ambulatory stress detection, comprising 1002 subjects and containing subjects' demographics, baseline psychological information, and five consecutive days of free-living physiological and contextual measurements collected through wearable devices and smartphones. This dataset represents a healthy population, showing associations between wearable physiological signals and self-reported daily-life stress. Using a data-driven approach, we identified digital phenotypes, characterized by self-reported poor health indicators and high depression, anxiety and stress scores, that are associated with blunted physiological responses to stress. These results emphasize the need for large-scale collections of multi-sensor data to build personalized stress models for precision medicine.